Failover systems and methods for performing backup operations

ABSTRACT

In certain embodiments, a tiered storage system is disclosed that provides for failover protection during data backup operations. The system can provide for an index, or catalog, for identifying and enabling restoration of backup data located on a storage device. The system further maintains a set of transaction logs generated by media agent modules that identify metadata with respect to individual data chunks of a backup file on the storage device. A copy of the catalog and transaction logs can be stored at a location accessible by each of the media agent modules. In this manner, in case of a failure of one media agent module during backup, the transaction logs and existing catalog can be used by a second media agent module to resume the backup operation without requiring a restart of the backup process.

CROSS-REFERENCE TO RELATED APPLICATION

Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet, or any correction thereto, are hereby incorporated by reference under 37 CFR 1.57. The present application is a continuation of U.S. patent application Ser. No. 13/958,353 filed Aug. 2, 2013, which is a continuation of U.S. patent application Ser. No. 12/982,165 filed Dec. 30, 2010, which claims the benefit of priority under 35 U.S.C. §119(e) of U.S. Provisional Patent Application No. 61/351,790, filed on Jun. 4, 2010, and entitled “FAILOVER SYSTEMS AND METHODS FOR PERFORMING BACKUP OPERATIONS,” each of which is hereby incorporated herein by reference in its entirety.

BACKGROUND

1. Field

The present disclosure relates to performing storage backup operations and, in particular, to systems and methods for providing failover during backup operations.

2. Description of the Related Art

Computers have become an integral part of business operations such that many banks, insurance companies, brokerage firms, financial service providers, and a variety of other businesses rely on computer networks to store, manipulate, and display information that is constantly subject to change. Oftentimes, the success or failure of an important transaction may turn on the availability of information that is both accurate and current. Accordingly, businesses worldwide recognize the commercial value of their data and seek reliable, cost-effective ways to protect the information stored on their computer networks.

To protect this stored data, network administrators can create backup copies of the stored information so that if the original data is destroyed or corrupted, the backup copy can be restored and used in place of the original data. For instance, the modular storage architecture of the GALAXY backup system offered by CommVault Systems, Inc. (Oceanport, N.J.) advantageously provides for a multi-tiered storage management solution for backing up data. One drawback of this and other conventional backup systems, however, is that an interruption of the backup process can require the entire process to be restarted, resulting in a loss of valuable time and resources, especially for large backup operations.

SUMMARY

In view of the foregoing, a need exists for improved systems and methods for performing backup operations. For example, there is a need for failover systems and methods for backing up data in a storage environment. There is also a need for load balancing between modules tasked with performing the backup operations and indexing the data such that when one module fails or is overloaded, another module can continue the process in place of the failed or overloaded module.

In certain embodiments of the invention, a tiered storage system is disclosed that provides for failover protection during data backup operations. In certain embodiments, the system provides for an index, or catalog, for identifying and enabling restoration of backup data located on a storage device. The system further maintains a set of transaction logs generated by media agent modules that identify metadata with respect to individual data chunks of a backup file on the storage device. A copy of the catalog and transaction logs can be stored at a location accessible by each of the media agent modules. In this manner, in case of a failure of one media agent module during backup, the transaction logs and existing catalog can be used by a second media agent module to resume the backup operation without requiring a restart of the backup process.

In certain embodiments, a method is disclosed for performing a backup operation in a storage system. The method comprises: receiving with a first media agent module executing on a first computing device a plurality of data chunks to be backed up on a storage device as a single backup file; storing a first data chunk of the plurality of data chunks in a backup format on the storage device; generating a first transaction log comprising metadata of the first data chunk; transmitting the first transaction log to a second computing device; storing a second data chunk of the plurality of data chunks in the backup format on the storage device; generating a second transaction log comprising metadata of the second data chunk; and transmitting the second transaction log to the second computing device, wherein said transmitting the first and second transaction logs is performed prior to the entire backup file being stored on the storage device.

In certain embodiments, a storage system is disclosed for performing backup operations in a network environment. The storage system includes a storage device, a first media agent module and an index. The storage device is configured to store backup data. The first media agent module executes on a first computing device and is communicatively coupled to the storage device. The first media agent module is also configured to direct first backup operations on the storage device. The index is maintained on a second computing device and is indicative of at least locations of contents of the backup data stored on the storage device. Moreover, the first media agent module is configured to: add data chunks to the backup data, the data chunks being part of a backup file to be stored on the storage device; for each data chunk added to the backup data on the storage device, generate a transaction log associated with the data chunk and comprising data for restoring one or more objects from the data chunk; and transmit the transaction logs to the second computing device prior to the entire backup file being stored on the storage device.

For purposes of summarizing the disclosure, certain aspects, advantages and novel features of the inventions have been described herein. It is to be understood that not necessarily all such advantages may be achieved in accordance with any particular embodiment of the invention. Thus, the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other advantages as may be taught or suggested herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a block diagram of a backup system according to certain embodiments of the invention.

FIG. 2 illustrates a flow chart of an exemplary embodiment of a catalog creation process usable by the backup system of FIG. 1.

FIG. 3 illustrates a flow chart of an exemplary embodiment of a failover backup process usable by the backup system of FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

As will be seen from the disclosure herein, certain embodiments of systems and methods are provided for enabling failover during a backup operation. In particular, embodiments of the invention include creating a catalog, or index, of individual objects or files within backup data on a storage device. Inventive systems can also include media agent modules, or other backup components, that further generate a set of transaction logs that identify metadata with respect to new data objects being stored to the backup data. A copy of the catalog and transaction logs can be stored at a location accessible by multiple media agent modules. As a result, if one media agent fails during a backup operation, a second media agent can access the transaction logs and the existing catalog to resume the backup operation without requiring a restart of the backup process. Such embodiments can also provide means for enabling load balancing or like rotation of media agent modules in completing a common backup operation.

The features of the systems and methods will now be described with reference to the drawings summarized above. Throughout the drawings, reference numbers are re-used to indicate correspondence between referenced elements. The drawings, associated descriptions, and specific implementations are provided to illustrate embodiments of the invention and not to limit the scope of the disclosure.

In addition, methods and functions described herein are not limited to any particular sequence, and the acts or blocks relating thereto can be performed in other sequences that are appropriate. For example, described acts or blocks may be performed in an order other than that specifically disclosed, or multiple acts or blocks may be combined in a single act or block.

FIG. 1 illustrates a block diagram of a backup system 100, according to certain embodiments of the invention. In general, the backup system 100 comprises a modular (or tiered) architecture that provides for failover during a backup operation. For example, the backup system 100 can maintain a central catalog, or index, and one or more transaction logs usable to identify and/or restore backup data on a storage device.

As shown, the backup system 100 comprises at least one storage device 102 for storing backup data 104. The storage device 102 may include any type of media capable of storing electronic data, such as, for example, magnetic storage (such as a disk or a tape drive), optical media, or other type of mass storage. In certain embodiments, the storage device 102 can be part of a storage area network (SAN), a Network Attached Storage (NAS), a virtual machine disk, combinations of the same or the like.

In certain embodiments, the storage device(s) 102 may be implemented as one or more storage “volumes” that include physical storage disks defining an overall logical arrangement of storage space. For instance, disks within a particular volume may be organized as one or more groups of redundant arrays of independent (or inexpensive) disks (RAID). In certain embodiments, the storage device(s) 102 may include multiple storage devices of the same or different media.

Storage of the backup data 104 to the storage device 102 is performed by media agent modules or devices 106 a and 106 b (collectively referred to by reference numeral “106”). In general, the media agent devices 106 comprise storage controller computers that serve as intermediary devices and/or means for managing the flow of data from, for example, client information stores to individual storage devices. For instance, the media agent 106 can comprise a module that conducts data between one or more source devices, such as a client computing device, and the storage device(s) 102.

In certain embodiments, the media agents 106 store the backup data 104 on the storage device 102 as a plurality of data chunks. The terms “chunk” and “data chunk” as used herein are broad terms and are used in their ordinary sense and include, without limitation, a portion of data having a payload and encapsulated with metadata describing the contents of the payload placed in a tag header of the chunk. In certain embodiments, a chunk represents the smallest restorable component (e.g., 512 megabytes) of an archive or backup file.
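By way of illustration only, a chunk of the kind described above, a payload encapsulated with a tag header describing that payload, can be sketched in Python. The layout below (a 4-byte length prefix, a JSON tag header carrying the chunk identifier, object names, payload length and a SHA-256 checksum, followed by the payload) is a hypothetical format assumed for this sketch and is not the disclosed on-media format; the class and field names are likewise illustrative.

```python
import hashlib
import json
import struct
from dataclasses import dataclass, field
from typing import List

# Hypothetical layout: [4-byte big-endian header length][JSON tag header][payload]
_HEADER_LEN = struct.Struct(">I")


@dataclass
class Chunk:
    chunk_id: int
    payload: bytes
    objects: List[str] = field(default_factory=list)  # objects stored in the payload

    def to_bytes(self) -> bytes:
        """Encapsulate the payload with a tag header describing its contents."""
        header = json.dumps({
            "chunk_id": self.chunk_id,
            "objects": self.objects,
            "payload_len": len(self.payload),
            "sha256": hashlib.sha256(self.payload).hexdigest(),
        }).encode("utf-8")
        return _HEADER_LEN.pack(len(header)) + header + self.payload

    @classmethod
    def from_bytes(cls, raw: bytes) -> "Chunk":
        """Parse a serialized chunk, validating the payload against its header."""
        (header_len,) = _HEADER_LEN.unpack_from(raw, 0)
        start = _HEADER_LEN.size
        header = json.loads(raw[start:start + header_len].decode("utf-8"))
        payload = raw[start + header_len:]
        if len(payload) != header["payload_len"]:
            raise ValueError("truncated or oversized chunk payload")
        if hashlib.sha256(payload).hexdigest() != header["sha256"]:
            raise ValueError("chunk payload checksum mismatch")
        return cls(header["chunk_id"], payload, header["objects"])
```

Carrying a checksum in the header lets a reader reject a damaged or partially written chunk before attempting to restore objects from it.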

In certain embodiments, the media agent 106 is communicatively coupled with and controls the storage device 102. For example, the media agent 106 may instruct the storage device 102 to use a robotic arm or other means to load or eject a media cartridge, and/or to archive, migrate, or restore application-specific data. In certain embodiments, the media agent 106 communicates with the storage device 102 via a local bus, such as a Small Computer System Interface (SCSI) adaptor. In some embodiments, the storage device 102 is communicatively coupled to the media agent 106 via a SAN.

Each media agent 106 can further maintain an index cache that stores index data generated during backup, migration, and restore operations as further described herein. Such index data provides the backup system 100 with an efficient and intelligent mechanism for locating backed up objects and/or files during restore or recovery operations. For example, the index data can include metadata such as file/object name(s), size, location, offset, checksum and the like of backup data 104 stored on the storage device 102.
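As a minimal sketch, assuming an in-memory cache keyed by object name, index data of this kind might be represented as follows. `IndexEntry` and `IndexCache` are hypothetical names, and modeling an object's "location" as a chunk identifier plus a byte offset within that chunk's payload is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IndexEntry:
    name: str       # file/object name
    size: int       # object size in bytes
    chunk_id: int   # chunk of the backup data that holds the object (its location)
    offset: int     # byte offset of the object within that chunk's payload
    checksum: str   # checksum used to validate the restored object


class IndexCache:
    """Hypothetical per-media-agent index cache keyed by object name."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}

    def add(self, entry: IndexEntry) -> None:
        # A newer entry for the same object replaces the older one.
        self._entries[entry.name] = entry

    def locate(self, name: str) -> Optional[IndexEntry]:
        """Return the entry for an object, or None if it was never backed up."""
        return self._entries.get(name)

    def extract(self, name: str, chunk_payload: bytes) -> bytes:
        """Slice one object out of the payload of the chunk named by its entry."""
        entry = self._entries[name]
        return chunk_payload[entry.offset:entry.offset + entry.size]
```

Because each entry records the chunk and offset of an object, a restore can read only the chunk holding the requested object rather than the entire backup file.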

Once a backup operation is complete, the index data is generally stored as an index 108 with the data backed up to the storage device 102. This advantageously facilitates access to the files and/or objects within the backup data when performing a restore operation. However, with conventional backup systems, in the event that there is a failure during backup of the data 104, a complete and accurate representation of the backed up data is not stored on the storage device 102. Thus, such failures oftentimes result in a restarting of the backup process and a re-creation of the index data.

To provide for failover during backup operations, the media agents 106 of the backup system 100 are further configured to generate one or more transaction logs for each data chunk backed up to the storage device 102. Such transaction logs can maintain information similar to that in entries of the index 108 (e.g., object name, size, offset, length, checksum, time stamp, combinations of the same or the like). Once a particular data chunk is committed to, or stored on, the storage device 102, the corresponding transaction log(s) are uploaded or transmitted on-the-fly to a main index, or catalog, 110.
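The ordering described above, in which a chunk is committed first and its transaction log is then uploaded on the fly, might be sketched as follows. This is an illustrative sketch under stated assumptions: a `bytearray` stands in for the backup data on the storage device, an in-memory `Catalog` stands in for the catalog 110, and the log fields simply mirror the examples listed in the text. `TransactionLog`, `Catalog` and `commit_chunk` are hypothetical names, not taken from the disclosure.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class TransactionLog:
    job_id: str
    chunk_id: int
    object_name: str
    offset: int      # where the chunk starts within the backup data
    length: int      # number of payload bytes in the chunk
    checksum: str    # SHA-256 of the payload
    timestamp: float = field(default_factory=time.time)


class Catalog:
    """In-memory stand-in for the catalog 110 (normally on a separate server)."""

    def __init__(self) -> None:
        self.uploaded: List[TransactionLog] = []

    def upload(self, log: TransactionLog) -> None:
        self.uploaded.append(log)


def commit_chunk(backup_data: bytearray, catalog: Catalog, job_id: str,
                 chunk_id: int, object_name: str, payload: bytes) -> TransactionLog:
    """Store one chunk, then immediately upload the transaction log describing it."""
    offset = len(backup_data)
    backup_data.extend(payload)  # 1. commit the chunk to the backup data
    log = TransactionLog(job_id, chunk_id, object_name, offset, len(payload),
                         hashlib.sha256(payload).hexdigest())
    catalog.upload(log)          # 2. upload its log before the next chunk is stored
    return log
```

Because the upload follows the commit, every log held by the catalog describes a chunk that has actually reached the storage device.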

The catalog 110, in certain embodiments, represents a copy of the most recent index 108 stored with the backup data 104 on the storage device 102. Like the index 108, the catalog 110 entries contain sufficient information to restore one or more files or blocks from the last completed backup operation. When used in combination with uploaded transaction logs, the catalog 110 can be advantageously used to resume a backup operation that terminates prematurely or is otherwise interrupted, such as by a failure of a media agent 106.

The catalog 110 is advantageously accessible to each of the media agents 106 such that if a first media agent (e.g., media agent 106 a) fails while performing a backup operation, a second media agent (e.g., media agent 106 b) can access the catalog 110 and resume the backup operation in place of the first media agent. For instance, in certain embodiments, the catalog 110 can be stored on a server or other computing device separate from the media agents 106. In yet other embodiments, the catalog 110 can be maintained by a storage manager 112. It will also be appreciated that catalog 110 can represent a computing device, such as a server computer, that maintains the catalog or index.
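One way to make the transaction logs reachable by every media agent is to keep them on a shared path, for example a share exported by the server holding the catalog 110. The sketch below assumes such a shared filesystem and stores one JSON file per chunk; `SharedLogStore` and the directory layout are hypothetical. Each log is written to a temporary file and then renamed with `os.replace`, so that a second media agent reading the directory does not observe a partially written log.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List


class SharedLogStore:
    """Hypothetical transaction-log store on a path every media agent can reach.

    Layout (an assumption): <root>/<job_id>/<chunk_id, zero-padded>.json
    """

    def __init__(self, root) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, job_id: str, chunk_id: int, log: dict) -> None:
        job_dir = self.root / job_id
        job_dir.mkdir(exist_ok=True)
        # Write to a temporary file first, then rename it into place, so readers
        # see either no log for this chunk or the complete log.
        fd, tmp_path = tempfile.mkstemp(dir=job_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(log, f)
        os.replace(tmp_path, job_dir / f"{chunk_id:010d}.json")

    def get_all(self, job_id: str) -> List[dict]:
        """Return every completed log for a job, ordered by chunk identifier."""
        job_dir = self.root / job_id
        if not job_dir.is_dir():
            return []
        return [json.loads(p.read_text(encoding="utf-8"))
                for p in sorted(job_dir.glob("*.json"))]
```

Whether the rename is atomic depends on the underlying filesystem: `os.replace` is atomic on POSIX local filesystems, but a network share should be checked before relying on that property, and a production version would likely also flush and sync the file before renaming it.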

In certain embodiments, the storage manager 112 comprises a module or application that coordinates and controls storage, migration, recovery and/or restore operations within the backup system 100. For instance, such operations can be based on one or more storage policies, schedules, user preferences or the like. As shown, the storage manager 112 can communicate with each of the media agents 106 and the catalog 110. In yet further embodiments, the storage manager 112 can communicate with the storage device(s) 102.

Although the backup system 100 is shown and described with respect to particular arrangements, it will be understood from the disclosure herein that other embodiments of the invention can take on different configurations. For instance, the backup system 100 can comprise a plurality of media agent modules or devices that each communicate with one or more storage devices and/or one or more client devices.

Furthermore, components of the backup system 100 can also communicate with each other via a computer network. For example, the network may comprise a public network such as the Internet, virtual private network (VPN), token ring or TCP/IP based network, wide area network (WAN), local area network (LAN), an intranet network, point-to-point link, a wireless network, cellular network, wireless data transmission system, two-way cable system, interactive kiosk network, satellite network, broadband network, baseband network, combinations of the same or the like.

FIG. 2 illustrates a flow chart of a catalog creation process 200 according to certain embodiments of the invention. For instance, the process 200 can be advantageously used to maintain a catalog or main index of metadata usable to restore backed up data and resume a backup operation following a premature failure of a backup component. For exemplary purposes, the process 200 will be described with reference to the components of the backup system 100 of FIG. 1.

At Block 205, the process 200 begins a backup operation performed by a media agent device 106. For example, the storage manager 112 may instruct the media agent device 106 to back up data relating to one or more applications executing on one or more client computing devices. As discussed, in certain embodiments, the media agent 106 stores the backup data 104 on the storage device 102 in a chunk-by-chunk manner.

In certain embodiments, the media agent device 106 receives the data to be backed up from one or more data agents operating on a client device. In certain examples, the data can comprise application-specific data or can include data streams with multiple data types or objects contained therein.

At Block 210, the media agent device 106 processes a data chunk of the received data to be backed up. In certain embodiments, such processing includes generating metadata indicative of the contents and/or attributes of the objects within the data chunk or of the data chunk itself, as well as information regarding the storage location of such objects or files on the storage device 102 (e.g., with the backup data 104).

The media agent device 106 then backs up the data chunk to the backup file 104 on the storage device 102 (Block 215). The media agent device 106 also uploads one or more transaction logs to the catalog 110 that contain the above-described metadata for the backed up data chunk (Block 220). In certain embodiments, a single transaction log corresponds to a single data chunk.

At Block 225, the process 200 determines if there are additional data chunks as part of the backup operation. If so, the process 200 returns to Block 210 to process the next data chunk. If not, the process 200 proceeds with Block 230 to store the index 108 with the backup data 104. In certain embodiments, the index 108 allows for restoring individual objects and/or files from the backup data 104. The process 200 also includes applying the uploaded transaction logs to the catalog 110 so that the catalog 110 contains up-to-date information reflecting the contents of the entire backup file 104 (Block 235).
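Blocks 205 through 235 can be summarized in a short Python sketch. It is illustrative only: `BackupFile` and `CatalogServer` are hypothetical in-memory stand-ins for the backup data 104 (with its index 108) and the catalog 110, and each chunk is modeled as a single `(object_name, payload)` pair.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class BackupFile:
    """Stand-in for backup data 104 on the storage device, plus its index 108."""
    chunks: List[bytes] = field(default_factory=list)
    index: List[dict] = field(default_factory=list)


@dataclass
class CatalogServer:
    """Stand-in for catalog 110: applied entries plus uploaded, unapplied logs."""
    entries: Dict[str, dict] = field(default_factory=dict)
    pending_logs: List[dict] = field(default_factory=list)

    def upload_log(self, log: dict) -> None:
        self.pending_logs.append(log)

    def apply_logs(self) -> None:
        for log in self.pending_logs:
            self.entries[log["object"]] = log
        self.pending_logs.clear()


def run_backup(chunks: Iterable[Tuple[str, bytes]], backup: BackupFile,
               catalog: CatalogServer) -> List[dict]:
    """Process 200 for one job; returns the index stored with the backup data."""
    index: List[dict] = []
    # Block 205: the job starts; Block 225 loops until no chunks remain.
    for chunk_id, (name, payload) in enumerate(chunks):
        # Block 210: generate metadata, including the chunk's location.
        log = {"object": name, "chunk_id": chunk_id, "length": len(payload)}
        backup.chunks.append(payload)   # Block 215: store the chunk
        catalog.upload_log(log)         # Block 220: upload its transaction log
        index.append(log)
    backup.index = list(index)          # Block 230: store index 108 with the data
    catalog.apply_logs()                # Block 235: bring the catalog up to date
    return index
```

Uploaded logs stay pending until Block 235 applies them, so during the backup the catalog holds the index of the last completed backup plus the logs for the chunks stored so far.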

It will be appreciated that the process 200 is not limited to the arrangement of blocks illustrated in FIG. 2. For example, in other embodiments, the transaction log(s) may be uploaded (Block 220) prior to, or concurrent with, the storage of the corresponding data chunks on the storage device 102.

FIG. 3 illustrates a flow chart of a failover backup process 300 according to certain embodiments of the invention. For instance, the process 300 can be used to transfer control of a backup operation from a first storage controller component to a second storage controller component, such as during a failure or for load balancing. In certain embodiments, the process 300 illustrates a failover method that is possible in a system utilizing the catalog creation process 200 of FIG. 2. For exemplary purposes, the process 300 will be described hereinafter with reference to the components of the backup system 100 of FIG. 1.

The process 300 begins at Block 305 by initiating a backup operation with the first media agent 106 a. At Block 310, the process 300 detects a failure of the first media agent 106 a. For instance, in certain embodiments, the storage manager 112 can detect that the first media agent 106 a has prematurely ceased performing the backup operation. In one embodiment, the failure of the first media agent 106 a causes the backup operation to fail, and during the next system restart, the storage manager 112 detects the failure of the first media agent 106 a.
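The text leaves the detection mechanism open. One common approach, assumed here purely for illustration, is a progress watchdog: the media agent reports each committed chunk, and the storage manager treats a job with no progress for longer than a timeout as failed. `JobMonitor` and its timeout are hypothetical; the clock is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List


class JobMonitor:
    """Hypothetical storage-manager watchdog: a job whose media agent has not
    reported progress within `timeout` seconds is treated as failed."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_progress: Dict[str, float] = {}  # job_id -> last report time

    def report_progress(self, job_id: str) -> None:
        """Called whenever the media agent commits a chunk for this job."""
        self.last_progress[job_id] = self.clock()

    def stalled_jobs(self) -> List[str]:
        """Jobs that are candidates for takeover by another media agent."""
        now = self.clock()
        return [job for job, last in self.last_progress.items()
                if now - last > self.timeout]

    def finish(self, job_id: str) -> None:
        """Stop tracking a job once it completes."""
        self.last_progress.pop(job_id, None)
```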

Upon detecting failure of the first media agent 106 a, the process 300 obtains a copy of the index associated with the last complete backup (Block 315). For example, the storage manager 112 can instruct the second media agent 106 b to retrieve a copy of the index 108 from the storage device 102, the catalog 110 (or a computing device maintaining the catalog 110) or the like. In certain embodiments, the retrieved index contains information for retrieving objects and/or files that were stored on the storage device 102 prior to the commencement of the current backup operation (e.g., the most recently completed full backup).

At Block 320, the second media agent 106 b also retrieves a copy of the transaction log(s) associated with the interrupted backup operation by the first media agent 106 a. In certain embodiments, the transaction logs are stored on the catalog server 110 as a result of Block 220 of the process 200. For instance, the storage manager 112 may instruct that the transaction logs be sent to the second media agent 106 b along with instructions to the second media agent 106 b to take over the interrupted backup operation.

At Block 325, the second media agent 106 b applies the transaction logs to the retrieved index to the point that reflects where in the backup process the first media agent 106 a failed. The second media agent 106 b is then able to resume the backup operation without needing to repeat the backup of data that was performed by the first media agent 106 a (Block 330). For instance, the second media agent 106 b can continue backing up the data according to the process 200 depicted in FIG. 2.
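Blocks 315 through 330 can be sketched as follows, under several assumptions that are not part of the disclosure: the index is a dictionary keyed by object name, each transaction log is a dictionary carrying `object`, `chunk_id` and `length`, chunk identifiers within a job start at zero, and a log is uploaded only after its chunk has been committed (the ordering of Blocks 215 and 220). The function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def replay_logs(base_index: Dict[str, dict],
                logs: List[dict]) -> Tuple[Dict[str, dict], int]:
    """Block 325: apply logs in chunk order to a copy of the last complete index.

    Returns the updated index and the identifier of the next chunk to store.
    """
    index = dict(base_index)  # leave the retrieved copy untouched
    next_chunk = 0
    for log in sorted(logs, key=lambda entry: entry["chunk_id"]):
        if log["chunk_id"] < next_chunk:
            continue  # duplicate upload of a chunk already applied
        if log["chunk_id"] > next_chunk:
            break     # a gap: later logs do not describe a contiguous prefix
        index[log["object"]] = log
        next_chunk += 1
    return index, next_chunk


def resume_backup(chunks: Sequence[Tuple[str, bytes]], base_index: Dict[str, dict],
                  logs: List[dict], stored: List[bytes],
                  upload: Callable[[dict], None]) -> Tuple[Dict[str, dict], int]:
    """Blocks 315-330 as run by the second media agent."""
    index, start = replay_logs(base_index, logs)
    del stored[start:]  # drop anything written after the last logged chunk
    for chunk_id in range(start, len(chunks)):  # Block 330: no repeated chunks
        name, payload = chunks[chunk_id]
        stored.append(payload)
        log = {"object": name, "chunk_id": chunk_id, "length": len(payload)}
        upload(log)
        index[name] = log
    return index, start
```

Replay stops at the first gap in chunk identifiers, because a chunk with no log cannot be assumed to be complete; anything the first media agent wrote beyond the last logged chunk is discarded and written again by the second media agent.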

Although the process 300 has been described with respect to detecting a failure of a media agent device, other embodiments of the invention can utilize similar steps to achieve load balancing or other selective use of multiple media agents during a single backup operation. For example, at Block 310, the storage manager 112 or other component can determine if the first media agent 106 a is operating under unbalanced and/or excessive load. Such an embodiment allows for the second media agent 106 b to take over the backup operation prior to a failure of the first media agent 106 a. For instance, the storage manager 112 can monitor bandwidth usage, a jobs queue and/or a schedule of the first media agent 106 a to evaluate its load.
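A load-based takeover decision of the kind described above might look like the following sketch. It considers only two of the signals mentioned, bandwidth usage and queued jobs; the thresholds, the `AgentLoad` record, and the tie-breaking order are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgentLoad:
    name: str
    bandwidth_used: float  # fraction of link capacity in use, 0.0-1.0
    queued_jobs: int       # jobs waiting in this media agent's queue


def pick_takeover_agent(current: AgentLoad, others: List[AgentLoad],
                        max_bandwidth: float = 0.85,
                        max_queue: int = 4) -> Optional[AgentLoad]:
    """Return a less-loaded agent to take over the job, or None to leave it alone."""
    overloaded = (current.bandwidth_used > max_bandwidth
                  or current.queued_jobs > max_queue)
    if not overloaded:
        return None
    # Only agents that are themselves within both limits are eligible.
    candidates = [a for a in others
                  if a.bandwidth_used <= max_bandwidth and a.queued_jobs <= max_queue]
    if not candidates:
        return None
    # Prefer the shortest queue, then the lowest bandwidth usage.
    return min(candidates, key=lambda a: (a.queued_jobs, a.bandwidth_used))
```

When a candidate is returned, the takeover itself can proceed through the same steps used after a failure (Blocks 315 through 330), since the index and transaction logs for the job are already available to the candidate.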

In certain embodiments of the invention, the backup operations disclosed herein can be used to copy data of one or more applications residing on and/or being executed by a computing device. For instance, the applications may comprise software applications that interact with a user to process data and may include, for example, database applications (e.g., SQL applications), word processors, spreadsheets, financial applications, management applications, e-commerce applications, browsers, combinations of the same or the like. For example, in certain embodiments, the applications may comprise one or more of the following: MICROSOFT EXCHANGE, MICROSOFT SHAREPOINT, MICROSOFT SQL SERVER, ORACLE, MICROSOFT WORD and LOTUS NOTES.

Moreover, in certain embodiments of the invention, data backup systems and methods may be used in a modular storage management system, embodiments of which are described in more detail in U.S. Pat. No. 7,035,880, issued Apr. 5, 2006, and U.S. Pat. No. 6,542,972, issued Jan. 30, 2001, each of which is hereby incorporated herein by reference in its entirety. For example, the disclosed backup systems may be part of one or more storage operation cells that includes combinations of hardware and software components directed to performing storage operations on electronic data. Exemplary storage operation cells usable with embodiments of the invention include CommCells as embodied in the QNet storage management system and the QiNetix storage management system by CommVault Systems, Inc., and as further described in U.S. Pat. No. 7,454,569, issued Nov. 18, 2008, which is hereby incorporated herein by reference in its entirety.

Systems and modules described herein may comprise software, firmware, hardware, or any combination(s) of software, firmware, or hardware suitable for the purposes described herein. Software and other modules may reside on servers, workstations, personal computers, computerized tablets, PDAs, and other devices suitable for the purposes described herein. Software and other modules may be accessible via local memory, via a network, via a browser, or via other means suitable for the purposes described herein. Data structures described herein may comprise computer files, variables, programming arrays, programming structures, or any electronic information storage schemes or methods, or any combinations thereof, suitable for the purposes described herein. User interface elements described herein may comprise elements from graphical user interfaces, command line interfaces, and other interfaces suitable for the purposes described herein.

Embodiments of the invention are also described above with reference to flow chart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products. It will be understood that each block of the flow chart illustrations and/or block diagrams, and combinations of blocks in the flow chart illustrations and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the acts specified in the flow chart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the acts specified in the flow chart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the acts specified in the flow chart and/or block diagram block or blocks.

While certain embodiments of the inventions have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the disclosure. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the disclosure. 

What is claimed is:
 1. A method for performing an operation in a data storage system, the method comprising: with a first computing device comprising one or more hardware processors: receiving a plurality of data units from a client computing device to store on at least one first storage device as part of a data protection operation; storing at least one data unit of the plurality of data units on the at least one first storage device; and prior to completion of the data protection operation, storing metadata to at least one second storage device in association with said storing of the at least one data unit and prior to, concurrently with, or subsequent to said storing of the at least one data unit; and with a second computing device comprising one or more hardware processors: receiving an instruction to take over control of the data protection operation; obtaining the metadata associated with the storing of the at least one data unit from the at least one second storage device; using at least the metadata, determining a point in the data protection operation at which the first computing device ceased performance of the data protection operation; and using the determined point in the data protection operation to resume performance of the data protection operation at least partly by storing on the at least one first storage device at least one other data unit which has not yet been stored on the at least one first storage device, and without repeating the storage of the at least one data unit on the at least one first storage device.
 2. The method of claim 1, further comprising, with the second computing device and prior to resuming performance of the data protection operation: obtaining an index associated with the data protection operation; applying at least the metadata to the index to create an updated index; and storing the updated index.
 3. A data storage system for performing an operation, comprising: a first computing device comprising one or more hardware processors, wherein the first computing device is configured to: receive a plurality of data units from a client computing device to store on at least one first storage device as part of a data protection operation; store at least one data unit of the plurality of data units on the at least one first storage device; and prior to completion of the data protection operation, store metadata to at least one second storage device in association with said storing of the at least one data unit and prior to, concurrently with, or subsequent to said storing of the at least one data unit; and a second computing device comprising one or more hardware processors, wherein the second computing device is configured to: receive an instruction to take over control of the data protection operation; obtain the metadata associated with the storing of the at least one data unit from the at least one second storage device; using at least the metadata, determine a point in the data protection operation at which the first computing device ceased performance of the data protection operation; and use the determined point in the data protection operation to resume performance of the data protection operation at least partly by storing on the at least one first storage device at least one other data unit which has not yet been stored on the at least one first storage device, and without repeating the storage of the at least one data unit on the at least one first storage device.
 4. The data storage system of claim 3, wherein the second computing device is further configured to, prior to resuming performance of the data protection operation: obtain an index associated with the data protection operation; apply at least the metadata to the index to create an updated index; and store the updated index.
 5. A non-transitory computer readable medium comprising code that, when executed, causes: a first computing device comprising one or more hardware processors to: receive a plurality of data units from a client computing device to store on at least one first storage device as part of a data protection operation; store at least one data unit of the plurality of data units on the at least one first storage device; and prior to completion of the data protection operation, store metadata to at least one second storage device in association with said storing of the at least one data unit and prior to, concurrently with, or subsequent to said storing of the at least one data unit; and a second computing device comprising one or more hardware processors to: receive an instruction to take over control of the data protection operation; obtain the metadata associated with the storing of the at least one data unit from the at least one second storage device; using at least the metadata, determine a point in the data protection operation at which the first computing device ceased performance of the data protection operation; and use the determined point in the data protection operation to resume performance of the data protection operation at least partly by storing on the at least one first storage device at least one other data unit which has not yet been stored on the at least one first storage device, and without repeating the storage of the at least one data unit on the at least one first storage device.
 6. The computer readable medium of claim 5, wherein the code further causes the second computing device to, prior to resuming performance of the data protection operation: obtain an index associated with the data protection operation; apply at least the metadata to the index to create an updated index; and store the updated index. 